---
title: Tutorial
keywords: fastai
sidebar: home_sidebar
summary: "The goal of this challenge is to find all instances of dolphins in a picture and then color the pixels of each dolphin with a unique color."
description: "The goal of this challenge is to find all instances of dolphins in a picture and then color the pixels of each dolphin with a unique color."
nb_path: "notebooks/tutorial/00_DolphinsTutorial.ipynb"
---
try:
    import dolphins_recognition_challenge
except Exception:
    if "google.colab" in str(get_ipython()):
        print("Running on CoLab")
        !pip install dolphins-recognition-challenge
%load_ext autoreload
%autoreload 2
import numpy as np
import PIL
from PIL import Image
import torch
import torchvision
import pandas as pd
import seaborn as sns
We start by downloading and visualizing the dataset, which contains 200 photographs with one or more dolphins each, split into a training set of 160 photographs and a validation set of 40 photographs.
from dolphins_recognition_challenge.datasets import get_dataset, display_batches
data_loader, data_loader_test = get_dataset("segmentation", batch_size=3)
display_batches(data_loader, n_batches=2)
To prevent overfitting, which happens when the dataset is too small, we perform a number of transformations to effectively increase its size. One transformation implemented in the torchvision library is RandomHorizontalFlip, and we will implement MyColorJitter, which is basically just a wrapper around the torchvision.transforms.ColorJitter class. We cannot use that class directly without a wrapper, because a transformation could affect the targets as well as the image. For example, if we were to implement RandomCrop, we would need to crop the segmentation masks and readjust the bounding boxes as well.
class MyColorJitter:
    def __init__(self, brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5):
        self.torch_color_jitter = torchvision.transforms.ColorJitter(
            brightness=brightness, contrast=contrast, saturation=saturation, hue=hue
        )

    def __call__(self, image, target):
        image = self.torch_color_jitter(image)
        return image, target
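The wrapper above passes the target through untouched because color jitter only changes pixel values. A geometric transform, by contrast, must update the target too. The sketch below illustrates this for a horizontal flip; the class name, the (xmin, ymin, xmax, ymax) box layout, and the (N, H, W) mask layout are illustrative assumptions, not the package's actual implementation:

```python
import numpy as np

class MyHorizontalFlip:
    """Illustrative sketch: a geometric transform must update the target too.

    Assumes `target` is a dict with "boxes" as an (N, 4) array of
    (xmin, ymin, xmax, ymax) and "masks" as an (N, H, W) numpy array;
    the RandomHorizontalFlip used in this tutorial may differ in detail.
    """

    def __call__(self, image, target):
        # `image` is assumed to be an (H, W) or (H, W, C) numpy array here
        width = image.shape[1]
        image = np.flip(image, axis=1).copy()

        boxes = np.asarray(target["boxes"], dtype=float)
        # mirror the x-coordinates: new_xmin = W - xmax, new_xmax = W - xmin
        boxes[:, [0, 2]] = width - boxes[:, [2, 0]]
        target["boxes"] = boxes

        # flip each segmentation mask along the horizontal axis as well
        target["masks"] = np.flip(target["masks"], axis=2).copy()
        return image, target


# toy example: a 1x4 image with one box covering the left half
image = np.array([[1, 2, 3, 4]])
target = {"boxes": [[0, 0, 2, 1]], "masks": np.array([[[1, 1, 0, 0]]])}
image, target = MyHorizontalFlip()(image, target)
print(target["boxes"])  # the box now covers the right half: [[2. 0. 4. 1.]]
```

If the boxes were not mirrored along with the pixels, the model would be trained against labels that point at empty water.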
We will apply a series of transformations to an image, combining them all into a single one as follows:
from dolphins_recognition_challenge.datasets import ToTensor, ToPILImage, Compose, RandomHorizontalFlip
def get_tensor_transforms(train):
    transforms = []
    # converts the image, a PIL image, into a PyTorch Tensor
    transforms.append(ToTensor())
    if train:
        # during training, randomly jitter colors and flip the training
        # images and ground-truth for data augmentation
        transforms.append(
            MyColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
        )
        transforms.append(RandomHorizontalFlip(0.5))
        # TODO: add additional transforms: e.g. random crop
    return Compose(transforms)
With data augmentation defined, we are ready to generate the actual datasets used for training our models.
batch_size = 4
data_loader, data_loader_test = get_dataset(
"segmentation", get_tensor_transforms=get_tensor_transforms, batch_size=batch_size
)
display_batches(data_loader, n_batches=4)
{% include tip.html content='incorporate more transformation classes such as RandomCrop etc. (https://pytorch.org/docs/stable/torchvision/transforms.html)' %}
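Following the tip, here is a rough sketch of what a crop transform would need to do. For simplicity it is a fixed center crop rather than a random one, and the class name, the (N, H, W) mask layout, and the box recomputation are illustrative assumptions, not the package's API:

```python
import numpy as np

class MyCenterCrop:
    """Hypothetical crop transform (a fixed center crop for simplicity).

    Unlike MyColorJitter, cropping changes geometry, so the masks must be
    cropped too and the bounding boxes recomputed from the cropped masks.
    Assumes `target["masks"]` is an (N, H, W) numpy array.
    """

    def __init__(self, out_h, out_w):
        self.out_h, self.out_w = out_h, out_w

    def __call__(self, image, target):
        h, w = target["masks"].shape[1:]
        top = (h - self.out_h) // 2
        left = (w - self.out_w) // 2

        image = image[top:top + self.out_h, left:left + self.out_w]
        masks = target["masks"][:, top:top + self.out_h, left:left + self.out_w]

        # recompute each bounding box from its cropped mask
        boxes = []
        for mask in masks:
            ys, xs = np.nonzero(mask)
            boxes.append([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1])

        target["masks"] = masks
        target["boxes"] = np.array(boxes, dtype=float)
        return image, target


# toy example: a 6x6 grayscale image with a 2x2 object in the middle
image = np.arange(36).reshape(6, 6)
masks = np.zeros((1, 6, 6), dtype=int)
masks[0, 2:4, 2:4] = 1
target = {"masks": masks, "boxes": np.array([[2, 2, 4, 4]], dtype=float)}
image, target = MyCenterCrop(4, 4)(image, target)
print(target["boxes"])  # the box shifts into cropped coordinates: [[1. 1. 3. 3.]]
```

A real RandomCrop would pick `top` and `left` at random each call and would also have to drop instances whose mask becomes empty after cropping.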
We can reuse instance segmentation models already trained on another dataset and fine-tune them for our particular problem, in our case a dataset with dolphins.
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor
def get_instance_segmentation_model(hidden_layer_size, box_score_thresh=0.5):
    # our dataset has two classes only - background and dolphin
    num_classes = 2

    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        pretrained=True,
        box_score_thresh=box_score_thresh,
    )

    # get the number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_channels=in_features_mask,
        dim_reduced=hidden_layer_size,
        num_classes=num_classes,
    )
    return model
Before using the constructed model, we should move it to an appropriate device. We check whether a GPU is available and move the model there if possible.
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
# get the model using our helper function
model = get_instance_segmentation_model(hidden_layer_size=256)
# move model to the right device
model.to(device)
# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
# and a learning rate scheduler which decreases the learning rate by
# 10x every 10 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
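StepLR multiplies the learning rate by `gamma` once every `step_size` epochs (assuming one `lr_scheduler.step()` call per epoch, as in the training loop later in this tutorial). The resulting schedule can be computed by hand:

```python
# Expected learning-rate schedule for StepLR(step_size=10, gamma=0.1):
#   lr(epoch) = base_lr * gamma ** (epoch // step_size)
base_lr, gamma, step_size = 0.005, 0.1, 10

schedule = [base_lr * gamma ** (epoch // step_size) for epoch in range(25)]
# epochs 0-9 train at 0.005, epochs 10-19 at 0.0005, epoch 20 onward at 0.00005
print(schedule[0], schedule[10], schedule[20])
```

Decaying the learning rate like this lets the optimizer take large steps early on and settle into a minimum with small steps later.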
We have implemented a function for training a model for one epoch, meaning that each image from the training dataset is used exactly once. Let's train for one epoch and compare the predictions we make before and after it.
data_loader, data_loader_test = get_dataset(
"segmentation",
batch_size=4,
get_tensor_transforms=get_tensor_transforms,
n_samples=8,
)
data_loader, data_loader_test = get_dataset(
"segmentation", get_tensor_transforms=get_tensor_transforms, batch_size=batch_size
)
from dolphins_recognition_challenge.instance_segmentation.model import train_one_epoch
from dolphins_recognition_challenge.instance_segmentation.model import show_predictions
show_predictions(model, data_loader=data_loader_test, n=1, score_threshold=0.5)
num_epochs = 1
for epoch in range(num_epochs):
    # train for one epoch, printing every 20 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch=epoch, print_freq=20)
train_one_epoch(model, optimizer, data_loader, device, epoch=1, print_freq=20)
show_predictions(model, data_loader=data_loader_test, n=1, score_threshold=0.5)
Now we can train the model fully, continuing for the remaining epochs.
num_epochs = 20
data_loader, data_loader_test = get_dataset(
"segmentation", batch_size=4, get_tensor_transforms=get_tensor_transforms
)
for epoch in range(1, num_epochs):
    # train for one epoch, printing every 20 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch=epoch, print_freq=20)
    # step the scheduler once per epoch to decay the learning rate
    lr_scheduler.step()
Let's visualise a few samples and print the IoU metric for each of them:
from dolphins_recognition_challenge.instance_segmentation.model import show_prediction, iou_metric_example
for i in range(4):
    iou_test_image = iou_metric_example(model, data_loader_test.dataset[i], 0.5)
    img, _ = data_loader_test.dataset[i]
    print(f"IOU metric for the input image is: {iou_test_image}")
    show_prediction(model, img, width=820)
Calculate the mean IoU metric for the entire test set:
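The IoU (intersection over union) of a predicted and a ground-truth mask is the number of pixels in their intersection divided by the number of pixels in their union. Here is a minimal numpy sketch for a single pair of binary masks; the package's `iou_metric` additionally has to match predicted instances to ground-truth instances, so it is more involved than this:

```python
import numpy as np

def mask_iou(pred_mask, true_mask):
    """IoU of two binary masks: |intersection| / |union|."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(pred, true).sum() / union

pred = np.array([[1, 1, 0, 0]])
true = np.array([[0, 1, 1, 0]])
print(mask_iou(pred, true))  # 1 overlapping pixel out of 3 in the union, i.e. 1/3
```

An IoU of 1.0 means the prediction exactly covers the ground truth, while 0.0 means they do not overlap at all.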
%%time
from dolphins_recognition_challenge.instance_segmentation.model import iou_metric, show_predictions_sorted_by_iou
mean_iou_testset, _ = iou_metric(model, data_loader_test.dataset)
print(f"Mean IOU metric for the test set is: {mean_iou_testset}")
...
show_predictions_sorted_by_iou(model, data_loader_test.dataset)
Here we can see how to use the submit_model function. We must pass the trained model, an alias that will be displayed on the leaderboard, a name, and an email. The function returns the path to the zipped submission file.
from dolphins_recognition_challenge.submissions import submit_model
zip_fname = submit_model(model, alias="dolphin123", name="Name Surname", email="name.surname@gmail.com")
Here we can check what is in the zip file. It contains the model and two CSV files: the first holds the IoU metric for each image from the validation set, and the second holds information about the competitor.
!unzip -vl "{zip_fname}"